5 research outputs found

    Perfect is the enemy of test oracle

    Automation of test oracles is one of the most challenging facets of software testing, yet it remains far less addressed than automated test input generation. Test oracles rely on a ground truth that can distinguish between correct and buggy behavior to determine whether a test fails (detects a bug) or passes. What makes the oracle problem challenging and undecidable is the assumption that the ground truth must know the exact expected, correct, or buggy behavior. However, we argue that one can still build an accurate oracle without knowing the exact correct or buggy behavior, only how the two might differ. This paper presents SEER, a learning-based approach that, in the absence of test assertions or other types of oracle, can determine whether a unit test passes or fails on a given method under test (MUT). To build the ground truth, SEER jointly embeds unit tests and the implementations of MUTs into a unified vector space, such that the neural representation of a test is similar to that of the MUTs it passes on and dissimilar to that of the MUTs it fails on. The classifier built on top of this vector representation serves as the oracle, generating "fail" labels when test inputs detect a bug in the MUT and "pass" labels otherwise. Our extensive experiments applying SEER to more than 5K unit tests from a diverse set of open-source Java projects show that the produced oracle is (1) effective in predicting pass and fail labels, achieving an overall accuracy, precision, recall, and F1 measure of 93%, 86%, 94%, and 90%, respectively, (2) generalizable, predicting the labels for the unit tests of projects that were in neither the training nor the validation set with a negligible performance drop, and (3) efficient, detecting the existence of bugs in only 6.5 milliseconds on average. Comment: Published in ESEC/FSE 202
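
    The idea of judging pass/fail from how close a test's embedding is to the MUT's embedding can be illustrated with a small sketch. The hashed bag-of-tokens encoder and the 0.5 threshold below are toy stand-ins for SEER's jointly trained neural encoder and classifier; this shows only the shape of the pipeline, not the authors' implementation.

```python
# Minimal sketch of a SEER-style learned oracle (illustrative, not the authors' code):
# embed a unit test and its method under test (MUT) into a shared vector space,
# then predict pass/fail from how close the two vectors are.

import hashlib
import math
import re

DIM = 64  # toy embedding dimensionality


def embed(code: str) -> list[float]:
    """Hash each identifier token into a fixed-size, L2-normalized vector."""
    vec = [0.0] * DIM
    for tok in re.findall(r"[A-Za-z_]\w*", code):
        idx = int(hashlib.md5(tok.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def oracle(test_src: str, mut_src: str, threshold: float = 0.5) -> str:
    """Label the test 'pass' if its embedding is close to the MUT's, 'fail' otherwise."""
    similarity = sum(a * b for a, b in zip(embed(test_src), embed(mut_src)))
    return "pass" if similarity >= threshold else "fail"


if __name__ == "__main__":
    mut = "int add(int a, int b) { return a + b; }"
    test = "void testAdd() { assertEquals(5, add(2, 3)); }"
    print(oracle(test, mut))  # prints 'pass' or 'fail' depending on similarity
```

    In SEER the encoder and the classifier are trained together so that passing test/MUT pairs land close in the shared space; the fixed similarity threshold here merely stands in for that learned classifier.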

    White-box Compiler Fuzzing Empowered by Large Language Models

    Compiler correctness is crucial, as miscompilation that falsifies program behavior can lead to serious consequences. In the literature, fuzzing has been extensively studied as a way to uncover compiler defects. However, compiler fuzzing remains challenging: existing approaches focus on black- and grey-box fuzzing, which generate tests without sufficient understanding of internal compiler behavior. As a result, they often fail to construct programs that exercise the conditions of intricate optimizations. Meanwhile, traditional white-box techniques are computationally inapplicable to the giant codebases of compilers. Recent advances demonstrate that Large Language Models (LLMs) excel at code generation and understanding tasks and have achieved state-of-the-art performance in black-box fuzzing. Nonetheless, prompting LLMs with compiler source-code information remains a missing piece of research in compiler testing. To this end, we propose WhiteFox, the first white-box compiler fuzzer that uses LLMs with source-code information to test compiler optimizations. WhiteFox adopts a dual-model framework: (i) an analysis LLM examines the low-level optimization source code and produces requirements on the high-level test programs that can trigger the optimization; (ii) a generation LLM produces test programs based on the summarized requirements. Additionally, optimization-triggering tests are used as feedback to further enhance test generation on the fly. Our evaluation on four popular compilers shows that WhiteFox can generate high-quality tests that exercise deep optimizations requiring intricate conditions, triggering up to 80 more optimizations than state-of-the-art fuzzers. To date, WhiteFox has found 96 bugs in total, with 80 confirmed as previously unknown and 51 already fixed. Beyond compiler testing, WhiteFox can also be adapted for white-box fuzzing of other complex, real-world software systems.
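
    The dual-model loop described above can be sketched in a few lines. In the sketch below, `call_llm` is a hypothetical stand-in for whatever LLM client is used and the prompt wording is illustrative only; it is not WhiteFox's actual prompting or code.

```python
# Sketch of the dual-model workflow described in the abstract (not WhiteFox's code).
# `call_llm` is a hypothetical stand-in for any LLM client.

from typing import Callable, List


def fuzzing_round(
    call_llm: Callable[[str], str],
    optimization_source: str,
    triggering_tests: List[str],
    n_tests: int = 5,
) -> List[str]:
    """Run one analysis + generation round for a single compiler optimization."""
    # (i) Analysis LLM: summarize the low-level optimization source code into
    #     requirements that a high-level test program must meet to trigger it.
    requirements = call_llm(
        "Describe what a test program must look like to trigger this "
        "compiler optimization:\n" + optimization_source
    )

    # (ii) Generation LLM: produce candidate test programs from the requirements,
    #      few-shot prompted with tests that previously triggered optimizations.
    examples = "\n\n".join(triggering_tests[:3])
    prompt = (
        "Requirements:\n" + requirements
        + "\n\nExamples of optimization-triggering tests:\n" + examples
        + "\n\nWrite one new test program."
    )
    return [call_llm(prompt) for _ in range(n_tests)]
```

    Each generated program would then be compiled with instrumentation that reports whether the targeted optimization fired; the programs that do trigger it flow back in as `triggering_tests` for later rounds, which is the on-the-fly feedback the abstract mentions.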

    ICSE 2015 SIGSOFT CAPS Report

    No full text

    Transforming test suites into croissants

    No full text
    Software developers often rely on regression testing to ensure that recent changes to the source code do not introduce bugs. Flaky tests, which non-deterministically pass or fail regardless of any change to the code, can reduce the effectiveness of regression testing. While the state of the art is advancing techniques for test-flakiness detection and mitigation, the community lacks a systematic approach for generating high-quality benchmarks of flaky tests with which to compare the effectiveness of such techniques. Inspired by the power of mutation testing in evaluating the fault-detection ability of tests, this paper proposes Croissant, a framework for injecting flakiness into test suites to assess the effectiveness of test-flakiness detection tools in finding these tests. Croissant implements 18 flakiness-inducing mutation operators. We designed these operators to allow controlling the non-determinism involved in flakiness, i.e., forcing many mutants to deterministically pass or fail so that flaky behavior can be observed. Our extensive empirical evaluation of Croissant on the test suites of 15 real-world projects confirms the ability of the designed mutation operators to generate high-quality mutants and their effectiveness in challenging test-flakiness detection tools to reveal flaky tests.
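
    As a rough illustration of what a flakiness-inducing mutation operator does, the snippet below rewrites a JUnit-style assertion so that its outcome depends on wall-clock timing, with a knob for pinning the mutant to a deterministic pass or fail. This is a conceptual sketch, not one of Croissant's 18 operators, and Croissant itself targets Java test suites directly rather than via a Python rewriter.

```python
# Conceptual sketch of a flakiness-inducing mutation (not one of Croissant's
# operators): rewrite `assertTrue(expr);` so its outcome depends on timing.
# The `force` knob mirrors the abstract's idea of controlling non-determinism
# so a mutant can be made to deterministically pass or fail.

import re


def inject_timing_flakiness(java_test_src: str, force: str = "flaky") -> str:
    """Rewrite `assertTrue(expr);` in Java test source.

    force="flaky" -> outcome also depends on System.nanoTime()
    force="pass"/"fail" -> assertion pinned to a deterministic outcome
    """
    conditions = {
        "flaky": lambda expr: f"({expr}) && (System.nanoTime() % 2 == 0)",
        "pass": lambda expr: "true",
        "fail": lambda expr: "false",
    }
    rewrite = conditions[force]
    return re.sub(
        r"assertTrue\((.+?)\);",
        lambda m: f"assertTrue({rewrite(m.group(1))});",
        java_test_src,
    )


if __name__ == "__main__":
    original = "assertTrue(result.isValid());"
    print(inject_timing_flakiness(original))          # flaky mutant
    print(inject_timing_flakiness(original, "fail"))  # deterministic-fail mutant
```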